KeyWorld: Extracting Keywords from a Document as a Small World

Authors

Yutaka Matsuo

Yukio Ohsawa

Mitsuru Ishizuka

Abstract

The small world topology is known to be widespread in biological, social and man-made systems. This paper shows that the small world structure also exists in documents, such as papers. A document is represented by a network: the nodes represent terms, and the edges represent the co-occurrence of terms. This network is shown to have the characteristics of a small world, i.e., high clustering and short path length. Based on this topology, we develop an indexing system called KeyWorld, which extracts important terms by measuring their contribution to the graph being a small world.
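The idea in the abstract can be illustrated with a minimal sketch. It is not the KeyWorld implementation: the abstract does not give the exact contribution measure, so the code below assumes that a term's contribution is the increase in average path length when that term is removed from the graph. The function names, the sentence-level co-occurrence window, and the penalty distance charged to disconnected pairs are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Nodes are terms; an edge links two terms that share a sentence (assumed window)."""
    graph = {}
    for sentence in sentences:
        terms = set(sentence.lower().split())
        for term in terms:
            graph.setdefault(term, set())
        for a, b in combinations(sorted(terms), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_path_length(graph, skip=None):
    """Mean shortest-path length over all node pairs, leaving out node `skip`.

    Assumption: an unreachable pair is charged a distance equal to the
    number of remaining nodes, so disconnecting the graph is penalized.
    """
    nodes = [n for n in graph if n != skip]
    if len(nodes) < 2:
        return 0.0
    penalty = len(nodes)
    total = 0
    for source in nodes:
        # Breadth-first search from `source`, never stepping onto `skip`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor != skip and neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total += sum(dist.get(t, penalty) for t in nodes if t != source)
    return total / (len(nodes) * (len(nodes) - 1))


def clustering_coefficient(graph):
    """Average fraction of each node's neighbor pairs that are themselves linked."""
    values = []
    for neighbors in graph.values():
        k = len(neighbors)
        if k < 2:
            values.append(0.0)
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values) if values else 0.0


def contribution(graph, term):
    """Assumed measure: how much the average path length grows without `term`."""
    return average_path_length(graph, skip=term) - average_path_length(graph)


def rank_terms(graph):
    """Terms sorted from largest to smallest assumed contribution."""
    return sorted(graph, key=lambda t: contribution(graph, t), reverse=True)


if __name__ == "__main__":
    toy = build_cooccurrence_graph(["a b c", "c d", "d e f"])
    print(clustering_coefficient(toy), average_path_length(toy))
    print(rank_terms(toy))
```

In the toy input, the two terms that bridge the clusters rank first, because removing either one disconnects the graph and lengthens the average path. A real document would also need stop-word removal and term selection before the graph is built, which this sketch omits.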

Similar resources

Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm

Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...

Extraction of Representative Keywords Considering Co-occurrence in Positive Documents

In linear text classification, user feedback is usually used to tune the representative keywords (RK) for a certain class. Although some algorithms (e.g. Rocchio) deal well with positive and negative user feedback to adjust the RKs, few studies have investigated how to adjust RKs based only on a small number of positive responses, which is a common case in real-world applications (e.g. users tend...

A Model for Extracting Keywords of Document Using Term Frequency and Distribution

In information retrieval systems, it is very important that documents are indexed with appropriate terms. In this paper, we propose a simple retrieval model based on the distribution characteristics of terms in addition to term frequency in documents. We define the distribution characteristics of keywords using a statistic, the standard deviation. We can extract document keywords that term fr...

Keyword Extraction from a Single Document Using Centrality Measures

Keywords characterize the topics discussed in a document. Extracting a small set of keywords from a single document is an important problem in text mining. We propose a hybrid structural and statistical approach to extract keywords. We represent the given document as an undirected graph, whose vertices are words in the document and the edges are labeled with a dissimilarity measure between two ...

Design and Analysis of the Performance of Clustering of Conversation Documents Based on Keyword Extraction Mechanism

Nowadays data mining has become one of the most fascinating domains in every field, such as medicine, shopping, business, multinational companies, information technology and many more. The main goal of data mining is to extract valuable information from large data sets in order to retrieve the desired result as output. In this thesis we mainly try to extract the large ...

Publication date: 2001
